NMR spectroscopy is unique as it provides a means to study
biomolecules with atomic resolution in a near-natural
environment. Traditionally, NMR spectroscopic analysis of
structures, interactions, and dynamics has been reserved for
molecular complexes that are smaller than 20 kDa. However,
in recent years, the introduction of TROSY techniques,
protein deuteration, and selective methyl-group labeling
has significantly extended this molecular weight limit.
Indeed, systems far over 100 kDa have been analyzed in
great detail, revealing unique functional aspects of large
molecular machines.

Many NMR spectroscopic studies on large systems have
been performed on highly symmetric complexes, as these
assemblies are relatively easy to prepare and result in simple
NMR spectra in which the resonance signals from all the
subunits are identical. For large asymmetric assemblies that
can be produced in E. coli by co-expression of all the
components, spectral crowding will lead to NMR spectra
that can no longer be analyzed in detail. In a limited number
of cases this crowding could be circumvented by in vitro
reconstitution of the complex from separately expressed
NMR active and NMR inactive subunits. This strategy is,
however, not generally applicable. As a result, most
eukaryotic systems, which are much more complex than their
bacterial or archaeal counterparts, will remain inaccessible
to high-resolution NMR studies.

Herein, we introduce a sequential co-expression method
for the preparation of large asymmetric complexes that
combines the advantages of in vivo reconstitution and the
benefits of partial NMR isotope labeling to reduce NMR
spectral complexity. We transform E. coli cells with two
plasmids carrying different promoters so that protein
expression can be induced independently. In this manner, it is
possible to induce protein synthesis for one set of proteins in
an NMR active medium (stage 1), whereas a second set of
proteins can be produced in an NMR invisible medium
(stage 2; Figure 1A). As all expressed proteins are present in
a single E. coli cell, the final complex can assemble in
a cellular environment, which prevents the aggregation of
subunits that are otherwise unstable in isolation. We refer to
our method to "label, express, and generate oligomers" for NMR
as "LEGO-NMR".

The LEGO method requires tightly controlled individual
DNA promoters such that the promoter that induces protein
expression in stage 1 is completely switched off in stage 2,
whereas the promoter for stage 2 is not active in stage 1. In
LEGO methods A1 and A2 (Supporting Information,
Figure S1), protein production is induced from an araBAD
promoter using arabinose in stage 1 and from a T7 promoter
using IPTG in stage 2. In this case, the glucose that is
present in stage 2 efficiently turns off the araBAD
promoter. In LEGO method B, we introduce a three-promoter
system, where protein production is induced from a T7
promoter in stage 1, and from an araBAD promoter in
stage 2. In this case, the T7 promoter is actively switched off
by the expression of T7 lysozyme in between stage 1 and
stage 2 from a third plasmid that contains a
rhamnose-inducible promoter. This inhibition is required as T7
expression would otherwise continue for over 4 h after the
removal of IPTG from the growth medium.
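
The promoter logic of the three LEGO methods can be summarized in a
small lookup structure. The following Python sketch is illustrative
only: the class and field names (Stage, LegoScheme, off_switch) are
hypothetical, and only the promoter and inducer assignments are taken
from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    promoter: str  # promoter that drives expression in this stage
    inducer: str   # compound added to switch that promoter on


@dataclass(frozen=True)
class LegoScheme:
    stage1: Stage
    stage2: Stage
    off_switch: str  # how the stage 1 promoter is silenced for stage 2


# Methods A1 and A2 share the same promoter order; method B reverses it
# and needs a third, rhamnose-inducible plasmid to shut down T7.
SCHEMES = {
    "A1/A2": LegoScheme(
        stage1=Stage("araBAD", "arabinose"),
        stage2=Stage("T7", "IPTG"),
        off_switch="glucose in the stage 2 medium represses araBAD",
    ),
    "B": LegoScheme(
        stage1=Stage("T7", "IPTG"),
        stage2=Stage("araBAD", "arabinose"),
        off_switch="T7 lysozyme from a rhamnose-inducible promoter",
    ),
}


def uses_distinct_promoters(scheme: LegoScheme) -> bool:
    """Independent induction requires a different promoter per stage."""
    return scheme.stage1.promoter != scheme.stage2.promoter


for name, scheme in SCHEMES.items():
    print(name, scheme.stage1.promoter, "->", scheme.stage2.promoter)
```

The structure only records which promoter is active in which stage; it
does not model induction kinetics or leakiness, which the method relies
on being negligible.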

To establish the LEGO-NMR methodology, we use two
different LSm complexes that play a role in mRNA
degradation and pre-mRNA splicing. The LSm1-7 complex
(containing the LSm1 to LSm7 proteins) and LSm2-8
complex (containing the LSm2 to LSm8 proteins) contain
seven different protein chains that are arranged in a unique
order. As most LSm proteins are insoluble in isolation,
neither the LSm1-7 nor the LSm2-8 complex can be
efficiently reconstituted in vitro from separately expressed
proteins. On the other hand, co-expression of the different
LSm proteins yields homogeneous NMR samples
(Figure 1B), showing that in-cell reconstitution functions
efficiently. However, owing to the large number of unique
resonances (649 expected backbone amide signals) the
resulting NMR spectra suffer significantly from spectral
overlap (Figure 1B), preventing an accurate analysis. LSm
complexes are thus a good example of eukaryotic protein
complexes that are currently not accessible for detailed
high-resolution NMR spectroscopic techniques.

Figure 1. Principle of LEGO-NMR spectroscopy. A) Method A1 (Table S2, Figure S1). In stage 1, the E. coli cells are grown in NMR active
medium and protein synthesis is induced from plasmid 1, resulting in the production of NMR active LSm5, LSm6, and LSm7. In stage 2, the cells
are transferred to an NMR inactive medium, where protein synthesis from plasmid 2 is induced, resulting in the production of NMR invisible
LSm2, LSm3, LSm4, and LSm8. By addition of a purification tag to a single subunit of the complex, the intact complex can be straightforwardly
isolated and purified to homogeneity. NMR active subunits are colored, whereas NMR invisible subunits are in gray. B) ¹H-¹⁵N TROSY NMR
spectrum (gray) of the uniformly NMR active LSm2-8. Especially the central region suffers from severe resonance overlap, complicating spectral
analysis significantly. C) Top left: LEGO ¹H-¹⁵N NMR spectrum (black) of the LSm2-8 complex, in which LSm5, LSm6, and LSm7 are NMR active
and LSm2, LSm3, LSm4, and LSm8 are NMR invisible. Other panels: LEGO ¹H-¹⁵N NMR spectra of the LSm2-8 complex in which individual LSm
subunits are NMR active in an otherwise NMR inactive background. The resonance signals in the spectra display a subset of the resonance
signals observed in the fully labeled complex (see enlargements).

To reduce the spectral overlap for the LSm2-8 complex by
a factor of approximately two, we labeled the LSm5, LSm6,
and LSm7 proteins with ¹⁵N in stage 1, whereas the LSm2,
LSm3, LSm4, and LSm8 proteins were produced in an NMR
inactive medium in stage 2. The resulting spectrum of the
LSm2-8 complex that only displays LSm5, LSm6, and LSm7
is significantly simplified (Figure 1C, top left). Importantly,
a very good overlay of a subset of the resonances of the fully
NMR active LSm2-8 complex is observed, as we intended to
achieve. This situation clearly allows for the identification of
the resonances in the LSm2-8 complex that result from the
LSm5, LSm6, and LSm7 proteins.

To establish the power of the sequential co-expression
methodology further, we produced seven different NMR
samples of the LSm2-8 complex, in which only a single LSm
protein was ¹⁵N-labeled in stage 1, whereas the remaining
six LSm proteins were expressed in an NMR inactive form in
stage 2 (Figure 1C). The seven spectra of the LSm2-8
complex allow for the unambiguous identification of the
resonance signals that result from each individual LSm
protein in the LSm2-8 complex. In this manner, a
simplification of 89% can be achieved (74 expected amide
signals in the LEGO LSm6 spectrum). Our approach is thus able
to deconvolute the complicated spectrum of the
hetero-heptameric complex into seven significantly simplified
sub-spectra. At the same time, the overlay of the seven NMR
spectra of the complexes that contain a single labeled LSm
protein yields the spectrum of the uniformly labeled LSm2-8
complex (Figure S2). Note that the proteins that are produced
in a deuterated form in stage 1 are efficiently re-protonated
at the beginning of stage 2 before the individual subunits are
incorporated in the final complex. This eliminates the need for
(refolding) methods to re-protonate backbone amides
(Figure S3) in the LSm2-8 complex.
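
As a check on the quoted figure, the degree of simplification can be
computed as the fraction of expected amide signals that is removed
from the spectrum. This is a minimal sketch; the function name is
hypothetical, and the signal counts (649 for the uniformly labeled
complex, 74 for the LEGO LSm6 spectrum) are those given in the text.

```python
def spectral_simplification(total_signals: int, observed_signals: int) -> float:
    """Fraction of expected signals removed by labeling only a subset."""
    if total_signals <= 0:
        raise ValueError("total_signals must be positive")
    if not 0 <= observed_signals <= total_signals:
        raise ValueError("observed_signals must lie between 0 and total_signals")
    return 1.0 - observed_signals / total_signals


# 649 expected backbone amide signals in the uniformly labeled complex,
# 74 of which remain in the LEGO LSm6 spectrum.
fraction = spectral_simplification(649, 74)
print(f"{100 * fraction:.0f} %")  # prints: 89 %
```

The count is based on expected signals only, so it gives an upper bound
on the practical gain; overlap among the remaining resonances is not
considered.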

The LSm2-8 complex is part of the U6 snRNP, where it
interacts with the 3' end of the U6 snRNA. To establish
which subunits in the LSm2-8 complex contact the RNA
substrate, we performed NMR titration experiments with
LSm2-8 LEGO complexes that either contained NMR active
LSm2, LSm3, LSm4, and LSm8 (Figure S4A) or that
contained NMR active LSm5, LSm6, and LSm7 (Figure S4B) in
an otherwise NMR inactive background. In both complexes,
we observed significant chemical shift perturbations upon
complex formation with the RNA. Importantly, the single
subunit LEGO spectra (Figure 1C) establish that all seven
LSm proteins are involved in RNA binding (Figure S4A,B) as
resonance signals for all the LSm proteins experience
chemical shift changes upon interaction with the RNA. To
resolve the remaining spectral overlap, we performed an
RNA titration experiment with an LSm2-8 complex that was
labeled at LSm5 only (Figure 2A). We then combined
information from the previously assigned LSm657
complex, the LSm5 LEGO spectrum (Figure 1C) and an
HNCA spectrum of a fully ²H, ¹³C, ¹⁵N-labeled LSm2-8
complex (Figure 2B) to assign the LSm5 residues in the
LSm2-8 complex that contact the RNA. In this case, we
exploited the fact that we were able to select the LSm5
resonance signals in the HNCA spectrum of the fully labeled
LSm2-8 complex, thus reducing the number of expected
resonance signals from 649 to 77, which significantly
simplified the assignment process. This approach revealed that
the residues that experience large chemical shift perturbations
upon interaction with the U6 snRNA are located in loop 5 of
the LSm5 protein (Figure 2C). This loop connects β-strands 4
and 5 in the LSm fold and lines the central pore of the LSm
ring. As the RNA we used for the interaction experiments
contains only nine bases and as all the LSm proteins are
involved in the RNA interaction (Figure S4), our data suggest
that the RNA binding site in LSm2-8 is at the central pore.
Additional information to support this observation can be
obtained from the assignment of the other LSm proteins in
the LSm2-8 complex in an analogous manner. Interestingly,
the eukaryotic Sm complex, the archaeal LSm complex,
and Hfq have all been shown to use this region to interact
with substrate RNA, indicating that this binding site is
conserved in the eukaryotic LSm complexes.

Figure 2. Assignment of LSm5 in the LSm2-8 complex. A) LSm5 in the
LSm2-8 complex before (black) and after (pink) the addition of the 3'
end of the U6 snRNA. B) LSm5 resonance strips from an HNCA
experiment that was recorded on a fully labeled LSm2-8 complex. The
resonance strips could be selected based on the LSm5 LEGO-NMR
spectrum (A, and Figure 1C). C) LSm5 residues that experience the
largest chemical shift changes upon addition of the U6 snRNA are
marked as pink spheres on a model of the LSm2-8 complex.

Methyl TROSY spectroscopy has been shown to be highly
suitable for the study of supramolecular complexes that are
inaccessible to backbone-directed TROSY spectroscopy. To
establish ε-¹H-¹³C methyl labeling of methionine residues in
concert with LEGO-NMR, we used the hetero-heptameric
LSm1-7 complex whose ¹H-¹⁵N TROSY spectra are of lower
quality compared to those of the LSm2-8 complex
(Figure S5). Methionine methyl TROSY spectra of the LSm1-7
complex, where all proteins are fully methionine labeled,
display a large number of well-resolved methyl resonances in
addition to a region that suffers from significant spectral
overlap (Figure 3A, top left). To resolve the spectral overlap
and to assign the well-resolved resonances to specific LSm
proteins, we prepared seven different LEGO NMR samples
of the LSm1-7 ring. In each of these samples a single LSm
protein was methionine labeled, whereas the other six LSm
proteins were NMR invisible. Methyl TROSY spectra of
these hetero-heptameric complexes allowed for the
unambiguous assignment of the methionine methyl groups to
individual LSm proteins (Figure 3A). Site-specific assignment
of these methyl groups can be made using a mutational
approach. In addition, the "singly labeled" LSm1-7
rings significantly resolved the spectral overlap of the
spectrum. Methionine methyl TROSY spectroscopy is thus
fully compatible with the LEGO-NMR methodology and can
provide high-resolution spectra for complexes that are not
amenable to ¹H/¹⁵N-based TROSY spectroscopy. Note that it
has been shown recently that methionine methyl groups are
excellent probes to study molecular interactions.

Figure 3. Methyl-group labeling. A) Methionine methyl-group spectra of the LSm1-7 complex. Top left: the methyl TROSY spectrum of an LSm1-7
complex in which all the LSm proteins are NMR active. This spectrum can be deconvoluted into seven simplified spectra that contain only
a single NMR active LSm protein (other panels). B) Methyl TROSY spectra of Ile-δ1-labeled LSm2-8 complexes. The spectrum of the fully
isoleucine-labeled complex (gray) is simplified by labeling only the LSm5, LSm6, and LSm7 proteins (black) or only the LSm5 protein (olive).

In addition to methionine methyl groups, methyl TROSY
spectroscopy is often performed in concert with labeled
methyl groups of isoleucine, leucine, valine, or alanine
residues. As opposed to methionine labeling, these amino
acids are incorporated into the protein through E. coli
metabolization of specifically labeled precursor molecules.
For isoleucine residues this is only possible in the presence of
glucose, as glucose induces catabolite repression that inhibits
metabolic pathways that would otherwise degrade
α-ketobutyric acid. Stage 1 in method A1, which we used for
methionine and nitrogen labeling (Figure S1, Table S2), uses
glycerol as a carbon source and can thus not be used for
isoleucine labeling. To label selected subunits with isoleucine
methyl groups we thus use method A2 (where the NMR
labeling is moved from stage 1 to stage 2) or method B (where
an arabinose-inducible vector is used in stage 2; Figure S1).

The high quality of the spectrum of the fully isoleucine-δ1
labeled complex reflects the strength of methyl-group
labeling for high molecular-weight complexes (Figure 3B, gray).
We then used method B to prepare a "half-labeled" LSm2-8
LEGO complex that contains NMR active isoleucine-δ1
methyl groups in LSm5, LSm6, and LSm7 (Figure 3B, black).
As observed for the H,N-based spectra (Figure 1C), a subset
of the resonances that result from the labeled proteins can be
readily identified. It is worth noting that the LSm5, LSm6, and
LSm7 proteins contain 17 isoleucine residues, 16 of which
yield well-dispersed resonance signals in the spectrum. To
extend the LEGO approach one step further, we used
method A2 to prepare an LSm2-8 complex, in which only
LSm5 is NMR active (Figure 3B, olive). The resulting HMQC
spectrum displays six distinct resonance signals that result
from the six isoleucine residues that are present in the LSm5
protein.

In the examples shown above, we ensured that isotope
labeling was restricted to a subset of the subunits in the
complex, whereas the remaining subunits were NMR
invisible. Interestingly, it is also possible to distribute different
labeling schemes over the different subunits. We demonstrate
this approach with an LSm2-8 complex that is uniformly ¹⁵N
labeled, LSm2, LSm3, LSm4, and LSm8 methionine labeled,
and LSm5, LSm6, and LSm7 isoleucine labeled (Figure 4).
Owing to the spectral separation of methionine and isoleucine
methyl groups, this approach allows for the independent and
simultaneous monitoring of NMR parameters from different
parts of a large complex.

Figure 4. Mixed ¹⁵N, ε-¹H,¹³C-methyl methionine and δ1-¹H,¹³C-methyl
isoleucine labeling of LSm2-8. The ¹H-¹⁵N TROSY spectrum (left)
displays all subunits in the complex (see Figure 1B). LSm2, LSm3,
LSm4, and LSm8 are methionine labeled (see Figure 3A,
which displays the methionine methyl groups for LSm1-7). LSm5, LSm6,
and LSm7 are isoleucine labeled (see Figure 3B).

NMR spectroscopic studies of large and asymmetric
protein complexes suffer from significant challenges related
to sample preparation and from spectral crowding owing to
a high number of unique resonances. We have introduced
a sequential co-expression strategy that tackles both issues
simultaneously. Using the LSm1-7 and LSm2-8 complexes,
we show that highly homogeneous samples that contain only
one NMR active subunit can be readily prepared.
Importantly, our strategy is compatible with backbone and
methyl-group side-chain labeling. LEGO-NMR is thus suitable for
the study of large asymmetric complexes including eukaryotic
systems that are currently inaccessible to detailed NMR
analysis. Interestingly, around 50% of the assemblies in the
Protein Data Bank (PDB) that contain three or more unique
chains have been prepared in E. coli, indicating that our
method is applicable to a wide variety of complexes.